Convolutional Neural Networks

Project: Write an Algorithm for Landmark Classification


In this notebook, some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this project. You will not need to modify the included code beyond what is requested. Sections that begin with '(IMPLEMENTATION)' in the header indicate that the following block of code will require additional functionality which you must provide. Instructions will be provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!

Note: Once you have completed all the code implementations, you need to finalize your work by exporting the Jupyter Notebook as an HTML document. Before exporting the notebook to HTML, all the code cells need to have been run so that reviewers can see the final implementation and output. You can then export the notebook by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question X' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.

The rubric contains optional "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. If you decide to pursue the "Stand Out Suggestions", you should include the code in this Jupyter notebook.


Why We're Here

Photo sharing and photo storage services like to have location data for each photo that is uploaded. With the location data, these services can build advanced features, such as automatic suggestion of relevant tags or automatic photo organization, which help provide a compelling user experience. Although a photo's location can often be obtained by looking at the photo's metadata, many photos uploaded to these services will not have location metadata available. This can happen when, for example, the camera capturing the picture does not have GPS or if a photo's metadata is scrubbed due to privacy concerns.

If no location metadata for an image is available, one way to infer the location is to detect and classify a discernable landmark in the image. Given the large number of landmarks across the world and the immense volume of images that are uploaded to photo sharing services, using human judgement to classify these landmarks would not be feasible.

In this notebook, you will take the first steps towards addressing this problem by building models to automatically predict the location of the image based on any landmarks depicted in the image. At the end of this project, your code will accept any user-supplied image as input and suggest the top k most relevant landmarks from 50 possible landmarks from across the world. The image below displays a potential sample output of your finished project.

Sample landmark classification output

The Road Ahead

We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.

  • Step 0: Download Datasets and Install Python Modules
  • Step 1: Create a CNN to Classify Landmarks (from Scratch)
  • Step 2: Create a CNN to Classify Landmarks (using Transfer Learning)
  • Step 3: Write Your Landmark Prediction Algorithm

Step 0: Download Datasets and Install Python Modules

Note: if you are using the Udacity workspace, YOU CAN SKIP THIS STEP. The dataset can be found in the /data folder and all required Python modules have been installed in the workspace.

Download the landmark dataset. Unzip the folder and place it in this project's home directory, at the location /landmark_images.

Install the following Python modules:

  • cv2
  • matplotlib
  • numpy
  • PIL
  • torch
  • torchvision

Step 1: Create a CNN to Classify Landmarks (from Scratch)

In this step, you will create a CNN that classifies landmarks. You must create your CNN from scratch (so, you can't use transfer learning yet!), and you must attain a test accuracy of at least 20%.

Although 20% may seem low at first glance, it seems more reasonable after realizing how difficult of a problem this is. Many times, an image that is taken at a landmark captures a fairly mundane image of an animal or plant, like in the following picture.

Bird in Haleakalā National Park

Just by looking at that image alone, would you have been able to guess that it was taken at the Haleakalā National Park in Hawaii?

An accuracy of 20% is significantly better than random guessing, which would provide an accuracy of just 2%. In Step 2 of this notebook, you will have the opportunity to greatly improve accuracy by using transfer learning to create a CNN.

Remember that practice is far ahead of theory in deep learning. Experiment with many different architectures, and trust your intuition. And, of course, have fun!

(IMPLEMENTATION) Specify Data Loaders for the Landmark Dataset

Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.

Note: Remember that the dataset can be found at /data/landmark_images/ in the workspace.

All three of your data loaders should be accessible via a dictionary named loaders_scratch. Your train data loader should be at loaders_scratch['train'], your validation data loader should be at loaders_scratch['valid'], and your test data loader should be at loaders_scratch['test'].

You may find this documentation on custom datasets to be a useful resource. If you are interested in augmenting your training and/or validation data, check out the wide variety of transforms!

In [1]:
import torch 
import numpy as np
import os

goGPU = torch.cuda.is_available() #Checking if GPU available 

print ('on GPU') if goGPU else print('On CPU')
on GPU
In [2]:
#Data mentioned in the instructions is not available in the workspace. Downloading data from https://udacity-dlnfd.s3-us-west-1.amazonaws.com/datasets/landmark_images.zip
import requests, zipfile, io

url = 'https://udacity-dlnfd.s3-us-west-1.amazonaws.com/datasets/landmark_images.zip'
destination_folder = 'data'

if not (os.path.exists(destination_folder)):
    print('Data not available in workspace, downloading...')
    r = requests.get(url)
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall(destination_folder)
    print('Download complete.')
else:
    print('Data available in workspace')
    
Data available in workspace
In [5]:
### TODO: Write data loaders for training, validation, and test sets
## Specify appropriate transforms, and batch_sizes
from torchvision import transforms, models
from torch.utils.data.sampler import SubsetRandomSampler
from torchvision import datasets
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

batch_size = 30
validation_set_size = 0.2 #How much of training set is for validation

#Data location
data_location = '/data/landmark_images/'
train_dataset_location, test_dataset_location = os.path.join(data_location, 'train'), os.path.join(data_location, 'test')

#Transforms compose
train_transform = transforms.Compose([
    transforms.Resize([224,224]),
    transforms.RandomVerticalFlip(),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.2, saturation=0.2, hue=0.2),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5 , 0.5, 0.5))
])

test_transform = transforms.Compose([
    transforms.Resize([224,224]),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5 , 0.5, 0.5))
])


train_data = datasets.ImageFolder(train_dataset_location, transform=train_transform)
test_data = datasets.ImageFolder(test_dataset_location, transform=test_transform)

#Splitting indices for validation
num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(validation_set_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

#Samplers
train_sampler = SubsetRandomSampler(train_idx)
validation_sampler = SubsetRandomSampler(valid_idx)

#Data Loaders

train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=train_sampler)
validation_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, sampler=validation_sampler)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size)


loaders_scratch = {'train': train_loader, 'valid': validation_loader, 'test': test_loader}
In [4]:
print(f'Number of images for training/validation is {len(train_data)} and for testing is {len(test_data)}')
print(f'Total batches for training are {len(train_loader)}, for validation {len(validation_loader)} and for testing {len(test_loader)}')
Number of images for training/validation is 4996 and for testing is 1250
Total batches for training are 134, for validation 34 and for testing 42

Question 1: Describe your chosen procedure for preprocessing the data.

  • How does your code resize the images (by cropping, stretching, etc)? What size did you pick for the input tensor, and why?
  • Did you decide to augment the dataset? If so, how (through translations, flips, rotations, etc)? If not, why not?

Answer:

The original pictures are 600x800. The images loaded to the datasets are being resized to 224x224. This selection was done based on what ResNet or VGG expect which is an input of 224 x 224 and it makes sense to follow that path.

Data Augmentation was applied to the training set as it is meant to get better performance. No Data Augmentation done on the test dataset.

(IMPLEMENTATION) Visualize a Batch of Training Data

Use the code cell below to retrieve a batch of images from your train data loader, display at least 5 images simultaneously, and label each displayed image with its class name (e.g., "Golden Gate Bridge").

Visualizing the output of your data loader is a great way to ensure that your data loading and preprocessing are working as expected.

In [12]:
classes = [str(x) for x in train_data.classes]
classes
Out[12]:
['00.Haleakala_National_Park',
 '01.Mount_Rainier_National_Park',
 '02.Ljubljana_Castle',
 '03.Dead_Sea',
 '04.Wroclaws_Dwarves',
 '05.London_Olympic_Stadium',
 '06.Niagara_Falls',
 '07.Stonehenge',
 '08.Grand_Canyon',
 '09.Golden_Gate_Bridge',
 '10.Edinburgh_Castle',
 '11.Mount_Rushmore_National_Memorial',
 '12.Kantanagar_Temple',
 '13.Yellowstone_National_Park',
 '14.Terminal_Tower',
 '15.Central_Park',
 '16.Eiffel_Tower',
 '17.Changdeokgung',
 '18.Delicate_Arch',
 '19.Vienna_City_Hall',
 '20.Matterhorn',
 '21.Taj_Mahal',
 '22.Moscow_Raceway',
 '23.Externsteine',
 '24.Soreq_Cave',
 '25.Banff_National_Park',
 '26.Pont_du_Gard',
 '27.Seattle_Japanese_Garden',
 '28.Sydney_Harbour_Bridge',
 '29.Petronas_Towers',
 '30.Brooklyn_Bridge',
 '31.Washington_Monument',
 '32.Hanging_Temple',
 '33.Sydney_Opera_House',
 '34.Great_Barrier_Reef',
 '35.Monumento_a_la_Revolucion',
 '36.Badlands_National_Park',
 '37.Atomium',
 '38.Forth_Bridge',
 '39.Gateway_of_India',
 '40.Stockholm_City_Hall',
 '41.Machu_Picchu',
 '42.Death_Valley_National_Park',
 '43.Gullfoss_Falls',
 '44.Trevi_Fountain',
 '45.Temple_of_Heaven',
 '46.Great_Wall_of_China',
 '47.Prague_Astronomical_Clock',
 '48.Whitby_Abbey',
 '49.Temple_of_Olympian_Zeus']
In [17]:
import matplotlib.pyplot as plt
%matplotlib inline

## TODO: visualize a batch of the train data loader

## the class names can be accessed at the `classes` attribute
## of your dataset object (e.g., `train_dataset.classes`)

def show_img(img):
    img = img /2 +0.5
    plt.imshow(np.transpose(img, (1,2,0)))

    
images, labels = next(iter(train_loader))
images = images.numpy()
    
fig = plt.figure(figsize=(30,20))

#Showing first 30 images
for idx, image in enumerate(images[0:30]):
    ax = fig.add_subplot(5,6, idx+1, xticks=[],yticks=[])
    show_img(images[idx])
    ax.set_title(train_data.classes[idx], wrap=True)
    

Initialize use_cuda variable

In [ ]:
# useful variable that tells us whether we should use the GPU
#use_cuda = torch.cuda.is_available()

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_scratch, and fill in the function get_optimizer_scratch below.

In [10]:
## TODO: select loss function
criterion_scratch = nn.CrossEntropyLoss()

def get_optimizer_scratch(model):
    ## TODO: select and return an optimizer
    return optim.SGD(model.parameters(), lr=0.01)
    

(IMPLEMENTATION) Model Architecture

Create a CNN to classify images of landmarks. Use the template in the code cell below.

In [12]:
# define the CNN architecture
class Net(nn.Module):
    ## TODO: choose an architecture, and complete the class
    def __init__(self):
        super(Net, self).__init__()
        
        ## Define layers of a CNN
        #Conv Layers
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        #Max Pool
        self.maxpool = nn.MaxPool2d(2,2)
        #Dropout
        self.dropout = nn.Dropout(0.25)
        #Linear
        self.fc1 = nn.Linear(64*28*28, 1024)
        self.fc2 = nn.Linear(1024, 512)
        self.fc3 = nn.Linear(512,len(classes)) #Number of classes
        
        
        
    
    def forward(self, x):
        ## Define forward behavior
        x = self.maxpool(F.relu(self.conv1(x)))
        x = self.maxpool(F.relu(self.conv2(x)))
        x = self.maxpool(F.relu(self.conv3(x)))
        #Flattening
        x = x.view(-1, 64*28*28)
        x = self.dropout(x)
        x = F.leaky_relu(self.fc1(x))
        x = self.dropout(x)
        x = F.leaky_relu(self.fc2(x))
        x = self.fc3(x)  
        
        return x

#-#-# Do NOT modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if goGPU:
    print('Training on GPU')
    model_scratch.cuda()
else:
    print('Training on CPU')
Training on GPU
In [13]:
model_scratch
Out[13]:
Net(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (dropout): Dropout(p=0.25)
  (fc1): Linear(in_features=50176, out_features=1024, bias=True)
  (fc2): Linear(in_features=1024, out_features=512, bias=True)
  (fc3): Linear(in_features=512, out_features=50, bias=True)
)

Question 2: Outline the steps you took to get to your final CNN architecture and your reasoning at each step.

Answer:

Inspired by the lessons in this CNN course, I implemented a Neural Network with 3 Convolutional Layers and after each conv layer, there is Max Pooling that will divide the feature maps by half:

224x224 => 112x112 => 56x56 => 28x28

Each Conv layer is passed through ReLu Activation function.

The tensor is then flattened and passed to a fully connected layer. Each layer goes through a leaky ReLu which behaves differently than normal ReLU.ReLu returns a value from f(x) = max(0, x) and takes zero value if values are negative. This can lead to dead neurons. With leaky ReLU this does not happen, the function is f(x) = max(0.001x, x). Leaky ReLU should perform better than ReLU. To avoid overfitting, dropout is applied.

(IMPLEMENTATION) Implement the Training Algorithm

Implement your training algorithm in the code cell below. Save the final model parameters at the filepath stored in the variable save_path.

In [65]:
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf 
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################
        # set the module to training mode
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if goGPU:
                data, target = data.cuda(), target.cuda()

            ## TODO: find the loss and update the model parameters accordingly
            ## record the average training loss, using something like
            ## train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - train_loss))
            optimizer.zero_grad() #Clear Gradients 
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - train_loss))
                

        ######################    
        # validate the model #
        ######################
        # set the model to evaluation mode
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['valid']):
            # move to GPU
            if goGPU:
                data, target = data.cuda(), target.cuda()

            ## TODO: update average validation loss 
            output = model(data)
            loss = criterion(output, target)
            valid_loss = valid_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - valid_loss))

            
        train_loss = train_loss/len(train_loader)
        valid_loss = valid_loss/len(validation_loader)

        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            train_loss,
            valid_loss
            ))

        ## TODO: if the validation loss has decreased, save the model at the filepath stored in save_path
        if (valid_loss <= valid_loss_min):
            print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(
            valid_loss_min,
            valid_loss))
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
        
        
    return model

(IMPLEMENTATION) Experiment with the Weight Initialization

Use the code cell below to define a custom weight initialization, and then train with your weight initialization for a few epochs. Make sure that neither the training loss nor validation loss is nan.

Later on, you will be able to see how this compares to training with PyTorch's default weight initialization.

In [12]:
def custom_weight_init(m):
    ## TODO: implement a weight initialization strategy
    #https://androidkt.com/initialize-weight-bias-pytorch/
    #https://pytorch.org/cppdocs/api/function_namespacetorch_1_1nn_1_1init_1a5e807af188fc8542c487d50d81cb1aa1.html
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_uniform_(m.weight.data,nonlinearity='leaky_relu')
    elif isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight.data)
        nn.init.constant_(m.bias.data, 0)

#-#-# Do NOT modify the code below this line. #-#-#
    
model_scratch.apply(custom_weight_init)
model_scratch = train(20, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch),
                      criterion_scratch, goGPU, 'ignore.pt')
Epoch: 1 	Training Loss: 0.029514 	Validation Loss: 0.113251
Validation loss decreased (inf --> 0.113251).  Saving model ...
Epoch: 2 	Training Loss: 0.028154 	Validation Loss: 0.111991
Validation loss decreased (0.113251 --> 0.111991).  Saving model ...
Epoch: 3 	Training Loss: 0.027412 	Validation Loss: 0.109013
Validation loss decreased (0.111991 --> 0.109013).  Saving model ...
Epoch: 4 	Training Loss: 0.026624 	Validation Loss: 0.107198
Validation loss decreased (0.109013 --> 0.107198).  Saving model ...
Epoch: 5 	Training Loss: 0.026064 	Validation Loss: 0.106568
Validation loss decreased (0.107198 --> 0.106568).  Saving model ...
Epoch: 6 	Training Loss: 0.025238 	Validation Loss: 0.104422
Validation loss decreased (0.106568 --> 0.104422).  Saving model ...
Epoch: 7 	Training Loss: 0.024664 	Validation Loss: 0.100792
Validation loss decreased (0.104422 --> 0.100792).  Saving model ...
Epoch: 8 	Training Loss: 0.023900 	Validation Loss: 0.102304
Epoch: 9 	Training Loss: 0.023374 	Validation Loss: 0.100503
Validation loss decreased (0.100792 --> 0.100503).  Saving model ...
Epoch: 10 	Training Loss: 0.022843 	Validation Loss: 0.107049
Epoch: 11 	Training Loss: 0.022093 	Validation Loss: 0.097971
Validation loss decreased (0.100503 --> 0.097971).  Saving model ...
Epoch: 12 	Training Loss: 0.021494 	Validation Loss: 0.098333
Epoch: 13 	Training Loss: 0.020709 	Validation Loss: 0.101588
Epoch: 14 	Training Loss: 0.020066 	Validation Loss: 0.101861
Epoch: 15 	Training Loss: 0.019374 	Validation Loss: 0.096805
Validation loss decreased (0.097971 --> 0.096805).  Saving model ...
Epoch: 16 	Training Loss: 0.018604 	Validation Loss: 0.101332
Epoch: 17 	Training Loss: 0.017776 	Validation Loss: 0.093195
Validation loss decreased (0.096805 --> 0.093195).  Saving model ...
Epoch: 18 	Training Loss: 0.017017 	Validation Loss: 0.099286
Epoch: 19 	Training Loss: 0.016401 	Validation Loss: 0.099320
Epoch: 20 	Training Loss: 0.015461 	Validation Loss: 0.147257

(IMPLEMENTATION) Train and Validate the Model

Run the next code cell to train your model.

In [19]:
## TODO: you may change the number of epochs if you'd like,
## but changing it is not required
num_epochs = 40 

#-#-# Do NOT modify the code below this line. #-#-#

# function to re-initialize a model with pytorch's default weight initialization
def default_weight_init(m):
    reset_parameters = getattr(m, 'reset_parameters', None)
    if callable(reset_parameters):
        m.reset_parameters()

# reset the model parameters
model_scratch.apply(default_weight_init)

# train the model
model_scratch = train(num_epochs, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch), 
                      criterion_scratch, goGPU, 'model_scratch.pt')
Epoch: 1 	Training Loss: 0.029197 	Validation Loss: 0.115044
Validation loss decreased (inf --> 0.115044).  Saving model ...
Epoch: 2 	Training Loss: 0.029186 	Validation Loss: 0.115021
Validation loss decreased (0.115044 --> 0.115021).  Saving model ...
Epoch: 3 	Training Loss: 0.029168 	Validation Loss: 0.114982
Validation loss decreased (0.115021 --> 0.114982).  Saving model ...
Epoch: 4 	Training Loss: 0.029134 	Validation Loss: 0.114839
Validation loss decreased (0.114982 --> 0.114839).  Saving model ...
Epoch: 5 	Training Loss: 0.029038 	Validation Loss: 0.114499
Validation loss decreased (0.114839 --> 0.114499).  Saving model ...
Epoch: 6 	Training Loss: 0.028871 	Validation Loss: 0.113662
Validation loss decreased (0.114499 --> 0.113662).  Saving model ...
Epoch: 7 	Training Loss: 0.028675 	Validation Loss: 0.112750
Validation loss decreased (0.113662 --> 0.112750).  Saving model ...
Epoch: 8 	Training Loss: 0.028412 	Validation Loss: 0.113125
Epoch: 9 	Training Loss: 0.028202 	Validation Loss: 0.111348
Validation loss decreased (0.112750 --> 0.111348).  Saving model ...
Epoch: 10 	Training Loss: 0.027938 	Validation Loss: 0.110428
Validation loss decreased (0.111348 --> 0.110428).  Saving model ...
Epoch: 11 	Training Loss: 0.027607 	Validation Loss: 0.109576
Validation loss decreased (0.110428 --> 0.109576).  Saving model ...
Epoch: 12 	Training Loss: 0.027508 	Validation Loss: 0.109378
Validation loss decreased (0.109576 --> 0.109378).  Saving model ...
Epoch: 13 	Training Loss: 0.027293 	Validation Loss: 0.108602
Validation loss decreased (0.109378 --> 0.108602).  Saving model ...
Epoch: 14 	Training Loss: 0.027018 	Validation Loss: 0.107507
Validation loss decreased (0.108602 --> 0.107507).  Saving model ...
Epoch: 15 	Training Loss: 0.026651 	Validation Loss: 0.107055
Validation loss decreased (0.107507 --> 0.107055).  Saving model ...
Epoch: 16 	Training Loss: 0.026574 	Validation Loss: 0.106808
Validation loss decreased (0.107055 --> 0.106808).  Saving model ...
Epoch: 17 	Training Loss: 0.026237 	Validation Loss: 0.106716
Validation loss decreased (0.106808 --> 0.106716).  Saving model ...
Epoch: 18 	Training Loss: 0.026002 	Validation Loss: 0.105181
Validation loss decreased (0.106716 --> 0.105181).  Saving model ...
Epoch: 19 	Training Loss: 0.025709 	Validation Loss: 0.107657
Epoch: 20 	Training Loss: 0.025471 	Validation Loss: 0.104407
Validation loss decreased (0.105181 --> 0.104407).  Saving model ...
Epoch: 21 	Training Loss: 0.025172 	Validation Loss: 0.104337
Validation loss decreased (0.104407 --> 0.104337).  Saving model ...
Epoch: 22 	Training Loss: 0.024905 	Validation Loss: 0.104480
Epoch: 23 	Training Loss: 0.024551 	Validation Loss: 0.102351
Validation loss decreased (0.104337 --> 0.102351).  Saving model ...
Epoch: 24 	Training Loss: 0.024283 	Validation Loss: 0.099776
Validation loss decreased (0.102351 --> 0.099776).  Saving model ...
Epoch: 25 	Training Loss: 0.024068 	Validation Loss: 0.098677
Validation loss decreased (0.099776 --> 0.098677).  Saving model ...
Epoch: 26 	Training Loss: 0.023761 	Validation Loss: 0.101413
Epoch: 27 	Training Loss: 0.023560 	Validation Loss: 0.097605
Validation loss decreased (0.098677 --> 0.097605).  Saving model ...
Epoch: 28 	Training Loss: 0.023196 	Validation Loss: 0.103888
Epoch: 29 	Training Loss: 0.023079 	Validation Loss: 0.103356
Epoch: 30 	Training Loss: 0.022845 	Validation Loss: 0.109154
Epoch: 31 	Training Loss: 0.022586 	Validation Loss: 0.097827
Epoch: 32 	Training Loss: 0.022230 	Validation Loss: 0.094719
Validation loss decreased (0.097605 --> 0.094719).  Saving model ...
Epoch: 33 	Training Loss: 0.022102 	Validation Loss: 0.098750
Epoch: 34 	Training Loss: 0.021917 	Validation Loss: 0.102691
Epoch: 35 	Training Loss: 0.021604 	Validation Loss: 0.098783
Epoch: 36 	Training Loss: 0.021364 	Validation Loss: 0.096913
Epoch: 37 	Training Loss: 0.021141 	Validation Loss: 0.101530
Epoch: 38 	Training Loss: 0.020726 	Validation Loss: 0.093346
Validation loss decreased (0.094719 --> 0.093346).  Saving model ...
Epoch: 39 	Training Loss: 0.020529 	Validation Loss: 0.093582
Epoch: 40 	Training Loss: 0.020237 	Validation Loss: 0.104659

(IMPLEMENTATION) Test the Model

Run the code cell below to try out your model on the test dataset of landmark images. Run the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 20%.

In [15]:
def test(loaders, model, criterion, gpGPU):

    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    # set the module to evaluation mode
    model.eval()

    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if goGPU:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss 
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - test_loss))
        # convert output probabilities to predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to true label
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))
In [16]:
# load the model that got the best validation accuracy
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
test(loaders_scratch, model_scratch, criterion_scratch, goGPU)
Test Loss: 3.039354


Test Accuracy: 24% (312/1250)

Notes

By running the test on the ignore.pt model which was trained only with 20 epochs, the accuracy was 25%. This could mean that the weight initialization plays an important role in this type of tasks. The model trained with default weights, after 25 epochs had higher validation loss. After retraining for 40 epochs, the validation went lower. However the weights initialization was surely speeding up the training while using the defaults ones gave small jumps but constant.


Step 2: Create a CNN to Classify Landmarks (using Transfer Learning)

You will now use transfer learning to create a CNN that can identify landmarks from images. Your CNN must attain at least 60% accuracy on the test set.

(IMPLEMENTATION) Specify Data Loaders for the Landmark Dataset

Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.

All three of your data loaders should be accessible via a dictionary named loaders_transfer. Your train data loader should be at loaders_transfer['train'], your validation data loader should be at loaders_transfer['valid'], and your test data loader should be at loaders_transfer['test'].

If you like, you are welcome to use the same data loaders from the previous step, when you created a CNN from scratch.

In [6]:
### TODO: Write data loaders for training, validation, and test sets
## Specify appropriate transforms, and batch_sizes


#Reusing the loaders already set
loaders_transfer = {'train': train_loader, 'valid': validation_loader, 'test': test_loader}

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_transfer, and fill in the function get_optimizer_transfer below.

In [70]:
## TODO: select loss function
criterion_transfer = nn.CrossEntropyLoss()


def get_optimizer_transfer(model):
    ## TODO: select and return optimizer
    return optim.Adam(model.fc.parameters(), lr=0.002)
    
    

(IMPLEMENTATION) Model Architecture

Use transfer learning to create a CNN to classify images of landmarks. Use the code cell below, and save your initialized model as the variable model_transfer.

In [71]:
torch. __version__
Out[71]:
'0.4.0'
In [7]:
## TODO: Specify model architecture

model_transfer = models.resnet50(pretrained=True)


#-#-# Do NOT modify the code below this line. #-#-#

if goGPU:
    model_transfer = model_transfer.cuda()
    print('Training on GPU')
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.torch/models/resnet50-19c8e357.pth
100%|██████████| 102502400/102502400 [00:01<00:00, 95131280.76it/s]
Training on GPU
In [73]:
model_transfer
Out[73]:
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (layer2): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (layer3): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (3): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (4): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (5): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (layer4): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0)
  (fc): Linear(in_features=2048, out_features=1000, bias=True)
)
In [8]:
for param in model_transfer.parameters():
    param.requires_grad = False
    
model_transfer.fc = nn.Sequential(nn.Linear(2048, 512),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(512,50),
                                 )

if goGPU:
    model_transfer = model_transfer.cuda()
In [21]:
model_transfer
Out[21]:
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (layer2): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (3): Bottleneck(
      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (layer3): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (3): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (4): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (5): Bottleneck(
      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (layer4): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (downsample): Sequential(
        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
    (2): Bottleneck(
      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
    )
  )
  (avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0)
  (fc): Sequential(
    (0): Linear(in_features=2048, out_features=512, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.2)
    (3): Linear(in_features=512, out_features=50, bias=True)
  )
)

Question 3: Outline the steps you took to get to your final CNN architecture and your reasoning at each step. Describe why you think the architecture is suitable for the current problem.

Answer:

My choice was betwen VGG16 which was trained on millions of images and ResNet which I had used for previous projects and performed well. For this time I tried ResNet50 which may be an overkill but I was curious to see the results. I was interested to try out EfficientNet but since it is supported from 0.11v then I did not upgrade pytorch in the workspace. I changed the last fc layer by replacing it with a 2 linear layers.Compared to the previous network, I also decided to go for a different optimizer (Adam instead of SGD).

(IMPLEMENTATION) Train and Validate the Model

Train and validate your model in the code cell below. Save the final model parameters at filepath 'model_transfer.pt'.

In [75]:
from workspace_utils import *
In [76]:
# TODO: train the model and save the best model parameters at filepath 'model_transfer.pt'
num_epochs = 50 #Trained first with 20, retrained again with 50

with active_session():
    train(num_epochs, loaders_transfer, model_transfer, get_optimizer_transfer(model_transfer), 
                          criterion_transfer, goGPU, 'model_transfer.pt')


#-#-# Do NOT modify the code below this line. #-#-#

# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
Epoch: 1 	Training Loss: 0.024027 	Validation Loss: 0.072269
Validation loss decreased (inf --> 0.072269).  Saving model ...
Epoch: 2 	Training Loss: 0.017629 	Validation Loss: 0.058657
Validation loss decreased (0.072269 --> 0.058657).  Saving model ...
Epoch: 3 	Training Loss: 0.015006 	Validation Loss: 0.051661
Validation loss decreased (0.058657 --> 0.051661).  Saving model ...
Epoch: 4 	Training Loss: 0.013599 	Validation Loss: 0.049485
Validation loss decreased (0.051661 --> 0.049485).  Saving model ...
Epoch: 5 	Training Loss: 0.013241 	Validation Loss: 0.048537
Validation loss decreased (0.049485 --> 0.048537).  Saving model ...
Epoch: 6 	Training Loss: 0.012771 	Validation Loss: 0.047417
Validation loss decreased (0.048537 --> 0.047417).  Saving model ...
Epoch: 7 	Training Loss: 0.012177 	Validation Loss: 0.046967
Validation loss decreased (0.047417 --> 0.046967).  Saving model ...
Epoch: 8 	Training Loss: 0.011824 	Validation Loss: 0.046299
Validation loss decreased (0.046967 --> 0.046299).  Saving model ...
Epoch: 9 	Training Loss: 0.011686 	Validation Loss: 0.046374
Epoch: 10 	Training Loss: 0.011121 	Validation Loss: 0.048451
Epoch: 11 	Training Loss: 0.011593 	Validation Loss: 0.043854
Validation loss decreased (0.046299 --> 0.043854).  Saving model ...
Epoch: 12 	Training Loss: 0.010888 	Validation Loss: 0.043518
Validation loss decreased (0.043854 --> 0.043518).  Saving model ...
Epoch: 13 	Training Loss: 0.011154 	Validation Loss: 0.044492
Epoch: 14 	Training Loss: 0.010831 	Validation Loss: 0.044983
Epoch: 15 	Training Loss: 0.010438 	Validation Loss: 0.043393
Validation loss decreased (0.043518 --> 0.043393).  Saving model ...
Epoch: 16 	Training Loss: 0.010444 	Validation Loss: 0.043540
Epoch: 17 	Training Loss: 0.010642 	Validation Loss: 0.046576
Epoch: 18 	Training Loss: 0.010276 	Validation Loss: 0.044594
Epoch: 19 	Training Loss: 0.010416 	Validation Loss: 0.044973
Epoch: 20 	Training Loss: 0.010160 	Validation Loss: 0.043717
Epoch: 21 	Training Loss: 0.010021 	Validation Loss: 0.045309
Epoch: 22 	Training Loss: 0.010111 	Validation Loss: 0.044608
Epoch: 23 	Training Loss: 0.010088 	Validation Loss: 0.043904
Epoch: 24 	Training Loss: 0.009752 	Validation Loss: 0.046380
Epoch: 25 	Training Loss: 0.009682 	Validation Loss: 0.045712
Epoch: 26 	Training Loss: 0.009558 	Validation Loss: 0.045416
Epoch: 27 	Training Loss: 0.009337 	Validation Loss: 0.044848
Epoch: 28 	Training Loss: 0.009628 	Validation Loss: 0.041322
Validation loss decreased (0.043393 --> 0.041322).  Saving model ...
Epoch: 29 	Training Loss: 0.009457 	Validation Loss: 0.045324
Epoch: 30 	Training Loss: 0.009391 	Validation Loss: 0.043611
Epoch: 31 	Training Loss: 0.009346 	Validation Loss: 0.044144
Epoch: 32 	Training Loss: 0.009252 	Validation Loss: 0.044361
Epoch: 33 	Training Loss: 0.008986 	Validation Loss: 0.045570
Epoch: 34 	Training Loss: 0.009435 	Validation Loss: 0.045846
Epoch: 35 	Training Loss: 0.009027 	Validation Loss: 0.045264
Epoch: 36 	Training Loss: 0.008917 	Validation Loss: 0.042958
Epoch: 37 	Training Loss: 0.009088 	Validation Loss: 0.046848
Epoch: 38 	Training Loss: 0.008874 	Validation Loss: 0.045416
Epoch: 39 	Training Loss: 0.008926 	Validation Loss: 0.045277
Epoch: 40 	Training Loss: 0.008844 	Validation Loss: 0.045239
Epoch: 41 	Training Loss: 0.008775 	Validation Loss: 0.047844
Epoch: 42 	Training Loss: 0.008538 	Validation Loss: 0.043959
Epoch: 43 	Training Loss: 0.008848 	Validation Loss: 0.044349
Epoch: 44 	Training Loss: 0.008653 	Validation Loss: 0.046702
Epoch: 45 	Training Loss: 0.008915 	Validation Loss: 0.043616
Epoch: 46 	Training Loss: 0.008547 	Validation Loss: 0.045123
Epoch: 47 	Training Loss: 0.008552 	Validation Loss: 0.045641
Epoch: 48 	Training Loss: 0.008940 	Validation Loss: 0.047819
Epoch: 49 	Training Loss: 0.008699 	Validation Loss: 0.046322
Epoch: 50 	Training Loss: 0.008427 	Validation Loss: 0.045696

(IMPLEMENTATION) Test the Model

Try out your model on the test dataset of landmark images. Use the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 60%.

In [77]:
test(loaders_transfer, model_transfer, criterion_transfer, goGPU)
Test Loss: 1.108883


Test Accuracy: 70% (885/1250)

Step 3: Write Your Landmark Prediction Algorithm

Great job creating your CNN models! Now that you have put in all the hard work of creating accurate classifiers, let's define some functions to make it easy for others to use your classifiers.

(IMPLEMENTATION) Write Your Algorithm, Part 1

Implement the function predict_landmarks, which accepts a file path to an image and an integer k, and then predicts the top k most likely landmarks. You are required to use your transfer learned CNN from Step 2 to predict the landmarks.

An example of the expected behavior of predict_landmarks:

>>> predicted_landmarks = predict_landmarks('example_image.jpg', 3)
>>> print(predicted_landmarks)
['Golden Gate Bridge', 'Brooklyn Bridge', 'Sydney Harbour Bridge']
In [9]:
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
In [10]:
if goGPU:
    model_transfer = model_transfer.cuda()
In [27]:
import cv2
from PIL import Image

## the class names can be accessed at the `classes` attribute
## of your dataset object (e.g., `train_dataset.classes`)

def predict_landmarks(img_path, k):
    ## TODO: return the names of the top k landmarks predicted by the transfer learned CNN
    top_k_classes = []
    img = Image.open(img_path)
    convert_to_tensor = transforms.Compose([transforms.Resize([224,224]),
                                     transforms.ToTensor()])
    img = convert_to_tensor(img)
    img.unsqueeze_(0)
    
    if goGPU:
        img = img.cuda()
        
    model_transfer.eval()
    output = model_transfer(img)
    value, index_class = output.topk(k)
    #print(value)
    #print(index_class[0].tolist())
    #print(classes[index_class[0][0]])
    for index in index_class[0].tolist():
        top_k_classes.append(classes[index])
    
    model_transfer.train()
        
    return top_k_classes
    


# test on a sample image
predict_landmarks('data/landmark_images/test/09.Golden_Gate_Bridge/1bc7a7f05288153b.jpg', 5)
Out[27]:
['09.Golden_Gate_Bridge',
 '30.Brooklyn_Bridge',
 '38.Forth_Bridge',
 '03.Dead_Sea',
 '06.Niagara_Falls']

(IMPLEMENTATION) Write Your Algorithm, Part 2

In the code cell below, implement the function suggest_locations, which accepts a file path to an image as input, and then displays the image and the top 3 most likely landmarks as predicted by predict_landmarks.

Some sample output for suggest_locations is provided below, but feel free to design your own user experience!

In [28]:
def suggest_locations(img_path):
    # get landmark predictions
    predicted_landmarks = predict_landmarks(img_path, 3)
    
    ## TODO: display image and display landmark predictions
    img = Image.open(img_path)
    plt.imshow(img)
    plt.title('Is this picture of the {}, {} or {}?'.format(predicted_landmarks[0].split('.')[1], predicted_landmarks[1].split('.')[1], predicted_landmarks[2].split('.')[1]))
    plt.show()


# test on a sample image
suggest_locations('data/landmark_images/test/09.Golden_Gate_Bridge/1bc7a7f05288153b.jpg')

(IMPLEMENTATION) Test Your Algorithm

Test your algorithm by running the suggest_locations function on at least four images on your computer. Feel free to use any images you like.

Question 4: Is the output better than you expected :) ? Or worse :( ? Provide at least three possible points of improvement for your algorithm.

Answer:

I was expecting a higher accuracy than 70% on validation, more on the 75-80%. Especially after using ResNet50. However some possible points that could improve the performance are:

  • More data to train on
  • Different combinations of data augmentation
  • Different Neural Net, e.g. VGG16
In [37]:
## TODO: Execute the `suggest_locations` function on
## at least 4 images on your computer.
## Feel free to use as many code cells as needed.

test_pics_location = os.path.join(data_location, 'LastPart_Test')
for file in os.listdir(test_pics_location.replace('/data', 'data')):
    suggest_locations(os.path.join(test_pics_location.replace('/data', 'data'), file))
    print(f'Filename: {file}')
Filename: Stonehenge.jpg
Filename: Seattle_Jap_Garden.jpg
Filename: MachuPicchu.jpg
Filename: Yellowstone_National_Park.jpg